This is a document describing Kara’s initial exploration of the data from Packets 1-3.

These packets included 29 scales or subscales, including the following:

Means by site

First, let’s compare the mean responses to each subscale across the 5 sites. I’ll plot each subscale in a separate mini-plot (denoted in the title of each mini-plot). The x-axis and color denote the site (US, Ghana, Thailand, China, or Vanuatu). Note the the range y-axis varies between mini-plots. The dot shows what the mean response was for that subscale for that site, and the error bars show a 95% confidence interval for that mean.

Joining, by = c("ctry", "packet", "subscale")

Cluster analysis

… of sites

Now let’s try a formal analysis for determining how similar these 5 sites are to each other.

Hierarchical clustering basically works like this: If there are 5 things - A, B, C, D, and E - this analysis will try to group pairs of like things together. E.g., first it might pair A and D together, then it will pretend that there are just 4 things - AD, B, C, and E. Then it might pair B and C together, and then pretend that there are just 3 things - AD, BC, and E. Then it might pair AD and BC together, and then pretend that there are just 2 things: ADBC and E. It will do this until there is just one “thing” left.

So to read this plot, look for the pairings. Sites that are on the same “branch” in this “dendrogram” showed similar patterns of means on the subscales. The closer together they are on that branch, the more similar they were.

… of subscales

Now let’s do the same thing for the subscales: Which subscales “hang together”?

To read this plot, again, look for the pairings. Subscales that are on the same “branch” in this “dendrogram” showed similar patterns of means across the 5 sites. The closer together they are on that branch, the more similar they were.

Look at correlations among subscales by site

Now let’s take a closer look at which subscales seem to “hang together,” by looking at the correlations between scales in their means for each site.

To read this plot, find one scale on the x-axis, and another scale on the y-axis. The value (and color) at that point in the grid shows you the correlation between these two subscale means across the 5 sites. The maximum possible correlation is +1 (red), and the minimum is -1 (blue).

I ordered the scales according to the cluster analysis we just did above, so you can see “patches” of scales that all tended to have similar patterns of means across sites.

Look at correlations among subscales by individuals

We could do the same thing thinking about individual participants instead of sites - but it’s important to keep in mind that most people didn’t fill out all 29 subscales! So we’ll focus on just looking at how the subscales within each packet (Packet 1, 2, or 3) “hang together” for the people who completed that packet.

As above, to read these plots, find one scale on the x-axis, and another scale on the y-axis. The value (and color) at that point in the grid shows you the correlation between these two subscale means across the 5 sites. The maximum possible correlation is +1 (red), and the minimum is -1 (blue). (Note that these scales are being presented in alphabetical order, not by any sort of clustering analysis.)

Packet 1

Setting row names on a tibble is deprecated.

Packet 2

Setting row names on a tibble is deprecated.

Packet 3

Setting row names on a tibble is deprecated.

All packets

Just for fun, here’s a look at the correlations among individual participants’ subscale scores across all scales. Note that some of these pairs of subscales probably have very few observations going into these correlations!! So take this with a grain of salt.

Setting row names on a tibble is deprecated.

---
title: 'SC data entry: Packets 1-3'
subtitle: 'KW initial exploration (updated with data from 2018-01-31)'
output:
  html_notebook: default
  html_document:
    df_print: paged
  pdf_document: default
---

```{r, include = FALSE}
knitr::opts_chunk$set(echo = FALSE, message = FALSE)
```

```{r, include = FALSE}
# set working director
# setwd("/Users/kweisman/Documents/Research (Stanford)/Projects/Templeton Grant/DATA WRANGLING/templeton_packets/packets123/")

# load packages
library(tidyverse)
library(rms)
library(ggdendro)

# load question key (including manual reverse-coding)
question_key <- read.csv("//Users/kweisman/Documents/Research (Stanford)/Projects/Templeton Grant/DATA WRANGLING/templeton_packets/packets123/packets123_question_key_byhand.csv")

# load data
d_wide <- read_csv("./packets123_data_byquestion_wide.csv")
d_wide_subscale <- read_csv("./packets123_data_bysubscale_wide.csv")
d_long <- read_csv("./packets123_data_byquestion_long.csv")
d_long_subscale <- read_csv("./packets123_data_bysubscale_long.csv")

# make custom functions
round2 <- function(x) {format(round(x, 2), digits = 2)}
```

This is a document describing Kara's initial exploration of the data from Packets 1-3. 

These packets included `r length(levels(factor(d_long_subscale$subscale))) - 1` scales or subscales, including the following:

- **Absorption** (Tellegen); range: 0 to 34
- **Absorption extra** (a handful of extra questions extra questions); range: 0 to 4
- **Daily spiritual experiences #1-4** (Underwood & Teresi); range: 0 to 20
- **Daily spiritual experiences #5-14** (Underwood & Teresi); range: 0 to 50
- **Daily spiritual experiences #5-14 (Thai)** (additional questions asked only in Thailand); range: 0 to 60
- **Daily spiritual experiences #15-16** (Underwood & Teresi); range: 0 to 10
- **Daily spiritual experiences #15-16 (Thai)** (additional questions asked only in Thailand); range: 0 to 30
- **Spiritual events** (Luhrmann); range: 0 to 88
- **Sensory seeking** (Brown et al.); range: -28 to 28
- **Body awareness** (Shields et al.); range: -36 to 36
- **Attention to feelings** (Salovey et al.); range: -42 to 42
- **Hallucination** (Alderson-Day); range: 0 to 27
- **VISQ: dialogic speech** (McCarthy-Jones & Fernyhough); range: -8 to 8
- **VISQ: inner speech** (McCarthy-Jones & Fernyhough); range: -10 to 10
- **VISQ: evaluative/motivational speech** (McCarthy-Jones & Fernyhough); range: -8 to 8
- **Inner speech** (Hardy/Bentall); range: -20 to 20
- **Hearing events** (Posey & Losch); range: 0 to 18
- **Encoding style** (Lewicki); range: 0 to 40
- **Mind metaphors** (Van Elk); range: -16 to 16
- **TAT: lack of cognitive confidence** (Wells et al.); range: -12 to 12
- **TAT: positive beliefs re: worrying** (Wells et al.); range: -12 to 12
- **TAT: cognitive self-consciousness** (Wells et al.); range: -12 to 12
- **TAT: uncontrollability/danger** (Wells et al.); range: -12 to 12
- **TAT: need to control thoughts** (Wells et al.); range: -12 to 12
- **Dualism: mental states** (Weisman); range: 0 to 8
- **Dualism: life events** (Weisman); range: 0 to 5
- **Dualism: inanimate consciousness** (Weisman); range: 0 to 6
- **Dualism: minds, selves, & world** (Weisman); range: 0 to 9
- **Dualism: epistemology** (Weisman); range: 0 to 5

# Means by site

First, let's compare the mean responses to each subscale across the 5 sites. I'll plot each subscale in a separate mini-plot (denoted in the title of each mini-plot). The x-axis and color denote the site (US, Ghana, Thailand, China, or Vanuatu). Note the the range y-axis varies between mini-plots. The dot shows what the mean response was for that subscale for that site, and the error bars show a 95% confidence interval for that mean. 

```{r}
d_long_subscale_boot <- d_long_subscale %>%
  filter(!is.na(sum_score)) %>%
  group_by(ctry, packet, subscale) %>%
  do(data.frame(rbind(smean.cl.boot(.$sum_score)))) %>%
  ungroup() %>%
  filter(subscale != "attn") %>%
  left_join(d_long_subscale %>%
              filter(!is.na(sum_score)) %>%
              count(ctry, packet, subscale)) %>%
  mutate(packet = paste("packet", packet),
         ctry = factor(ctry,
                       levels = c("us", "ghana", 
                                  "thailand", "china", "vanuatu")),
         subscale = 
           factor(subscale,
                  levels = c("exwl", 
                             "exwl_extra",
                             "dse_1to4", 
                             "dse_5to14", 
                             "dse_5to14_thai",
                             "dse_15to16", 
                             "dse_15to16_thai",
                             "spev", 
                             "sen_sensory_seeking", 
                             "sen_body_awareness", 
                             "sen_trait_metamood",
                             "her2_hallucination",
                             "invo_VISQ_dialogic_speech", 
                             "invo_VISQ_inner_speech",
                             "invo_VISQ_eval_motiv_inner_speech",
                             "invo_hardy_bentall",
                             "her_posey_losch",
                             "enco_lewicki",
                             "meta_van_elk",
                             "tat_confidence",
                             "tat_positive_beliefs",
                             "tat_cognitive",
                             "tat_uncontrollability",
                             "tat_need_control",
                             "minw_mental_states",
                             "minw_life_events",
                             "minw_inanimate",
                             "minw_selves_souls_world",
                             "minw_epistemic"),
                  labels = c("absorption (tellegen)", 
                             "absorption (extra)",
                             "daily spiritual experiences (#1-4)",
                             "daily spiritual experiences (#5-14)",
                             "daily spiritual experiences (#5-14 thai)",
                             "daily spiritual experiences (#15-16)",
                             "daily spiritual experiences (#15-16 thai)",
                             "spiritual events", 
                             "sensory seeking",
                             "body awareness", 
                             "attention to feelings",
                             "hallucination", 
                             "VISQ: dialogic speech",
                             "VISQ: inner speech",
                             "VISQ: evaluative",
                             "inner speech",
                             "hearing events",
                             "encoding style",
                             "mind metaphors",
                             "TAT: lack of cognitive confidence",
                             "TAT: positive beliefs re: worrying",
                             "TAT: cognitive self-consciousness",
                             "TAT: uncontrollability/danger",
                             "TAT: need to control thoughts",
                             "dualism: mental states",
                             "dualism: life events",
                             "dualism: inanimate consciousness",
                             "dualism: minds, selves, & world",
                             "dualism: epistemology")))
```

```{r, fig.width = 4, fig.asp = 4}
ggplot(d_long_subscale_boot %>%
         mutate(subscale = 
                  factor(subscale,
                         labels = 
                           c("absorption\n(tellegen)\nrange: 0 to 34",
                             "absorption\n(extra)\nrange: 0 to 4",
                             "daily spiritual experiences\n(underwood & teresi; #1-4)\nrange: 0 to 20",
                             "daily spiritual experiences\n(underwood & teresi; #5-14)\nrange: 0 to 50",
                             "daily spiritual experiences\n(add'l thai versions for #5-14)\nrange: 0 to 60",
                             "daily spiritual experiences\n(underwood & teresi; #15-16)\nrange: 0 to 10",
                             "daily spiritual experiences\n(add'l thai versions for #15-16)\nrange: 0 to 30",
                             "spiritual events\n(luhrmann)\nrange: 0 to 88",
                             "sensory seeking\n(brown et al.)\nrange: -28 to 28",
                             "body awareness\n(shields et al.)\nrange: -36 to 36",
                             "attention to feelings\n(salovey et al.)\nrange: -42 to 42",
                             "hallucination\n(alderson-day)\nrange: 0 to 27",
                             "VISQ: dialogic speech\n(mccarthy-jones & fernyhough)\nrange: -8 to 8",
                             "VISQ: inner speech\n(mccarthy-jones & fernyhough)\nrange: -10 to 10",
                             "VISQ: evaluative/motivational speech\n(mccarthy-jones & fernyhough)\nrange: -8 to 8",
                             "inner speech\n(hardy/bentall)\nrange: -20 to 20",
                             "hearing events\n(posey & losch)\nrange: 0 to 18",
                             "encoding style\n(lewicki)\nrange: 0 to 40",
                             "mind metaphors\n(van elk)\nrange: -16 to 16",
                             "TAT: lack of cognitive confidence\n(wells et al.)\nrange: -12 to 12",
                             "TAT: positive beliefs re: worrying\n(wells et al.)\nrange: -12 to 12",
                             "TAT: cognitive self-consciousness\n(wells et al.)\nrange: -12 to 12",
                             "TAT: uncontrollability/danger\n(wells et al.)\nrange: -12 to 12",
                             "TAT: need to control thoughts\n(wells et al.)\nrange: -12 to 12",
                             "dualism: mental states\n(weisman)\nrange: 0 to 8",
                             "dualism: life events\n(weisman)\nrange: 0 to 5",
                             "dualism: inanimate consciousness\n(weisman)\nrange: 0 to 6",
                             "dualism: minds, selves, & world\n(weisman)\nrange: 0 to 9",
                             "dualism: epistemology\n(weisman)\nrange: 0 to 5"))) %>%
         # filter(!grepl("thai", subscale)) %>% # get rid of thai-only scales
         mutate(packet = gsub("packet ", "P", packet)),
       aes(x = ctry, y = Mean, color = ctry)) +
  facet_wrap(~ reorder(interaction(packet, subscale, sep = ": "),
                       as.numeric(factor(packet))),
             ncol = 3, scales = "free") +
  geom_pointrange(aes(ymin = Lower, ymax = Upper)) +
  geom_text(aes(label = paste0("(n=", n, ")"), y = Lower), 
            size = 2, nudge_x = 0.15, hjust = 0) +
  scale_x_discrete(expand = c(0, 1)) +
  scale_color_brewer(palette = "Dark2") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1),
        legend.position = "top") +
  labs(title = "mean subscale scores by site",
       subtitle = "ordered by packet (P1, P2, P3)\nerror bars are bootstrapped 95% confidence intervals",
       x = "site", color = "site",
       y = "mean subscale score (range varies by subscale)")
```

# Cluster analysis

```{r}
cor_by_site <- d_long_subscale_boot %>%
  filter(!grepl("thai", subscale)) %>% 
  mutate(subscale = ifelse(subscale == "inner speech",
                           paste(as.character(subscale),
                                 gsub("packet ", "p", packet),
                                 sep = "_"),
                           as.character(subscale))) %>%
  distinct(ctry, subscale, Mean) %>%
  filter(subscale != "inner voice") %>%
  ungroup() %>%
  spread(subscale, Mean) %>%
  data.frame() %>%
  column_to_rownames("ctry")
```

## ... of sites

Now let's try a formal analysis for determining how similar these 5 sites are to each other. 

Hierarchical clustering basically works like this: If there are 5 things - A, B, C, D, and E - this analysis will try to group pairs of like things together. E.g., first it might pair A and D together, then it will pretend that there are just 4 things - AD, B, C, and E. Then it might pair B and C together, and then pretend that there are just 3 things - AD, BC, and E. Then it might pair AD and BC together, and then pretend that there are just 2 things: ADBC and E. It will do this until there is just one "thing" left.

So to read this plot, look for the pairings. Sites that are on the same "branch" in this "dendrogram" showed similar patterns of means on the subscales. The closer together they are on that branch, the more similar they were.

```{r, fig.width = 2, fig.asp = 0.6}
clust_sites <- hclust(dist(cor_by_site))
ggdendrogram(clust_sites) +
  labs(title = "hierarchical clustering of sites",
       subtitle = "using mean subscale scores by site")
```

## ... of subscales

Now let's do the same thing for the subscales: Which subscales "hang together"? 

To read this plot, again, look for the pairings. Subscales that are on the same "branch" in this "dendrogram" showed similar patterns of means across the 5 sites. The closer together they are on that branch, the more similar they were.

```{r, fig.width = 4, fig.asp = 0.6}
clust_subscales <- hclust(dist(t(cor_by_site)))
ggdendrogram(clust_subscales) +
  labs(title = "hierarchical cluster of subscales",
       subtitle = "using mean subscale scores by site")
```

# Look at correlations among subscales by site

Now let's take a closer look at which subscales seem to "hang together," by looking at the correlations between scales in their means for each site. 

To read this plot, find one scale on the x-axis, and another scale on the y-axis. The value (and color) at that point in the grid shows you the correlation between these two subscale means across the 5 sites. The maximum possible correlation is +1 (red), and the minimum is -1 (blue).

I ordered the scales according to the cluster analysis we just did above, so you can see "patches" of scales that all tended to have similar patterns of means across sites.

```{r, fig.width = 7, fig.asp = 1}
d_long_subscale_boot %>%
  filter(!grepl("thai", subscale)) %>% 
  mutate(subscale = ifelse(subscale == "inner speech",
                           paste(as.character(subscale),
                                 gsub("packet ", "p", packet),
                                 sep = "_"),
                           as.character(subscale))) %>%
  distinct(ctry, subscale, Mean) %>%
  ungroup() %>%
  spread(subscale, Mean) %>%
  data.frame() %>%
  column_to_rownames("ctry") %>%
  cor(use = "pairwise.complete.obs") %>%
  # corrplot version:
  # corrplot::corrplot(method = "color", tl.col = "black",
  #                    addCoef.col = "black", order = "hclust",
  #                    col = RColorBrewer::brewer.pal(n = 11, name = "PRGn"),
  #                    title = "\ncorrelations between subscale means (by country), ordered by hierarchical clustering")

  # ggplot version:
  data.frame() %>%
  rownames_to_column("subscaleA") %>%
  mutate(subscaleA_order = as.numeric(factor(as.numeric(factor(subscaleA)),
                                             levels = clust_subscales$order))) %>%
  gather(subscaleB, cor, -subscaleA, -subscaleA_order) %>%
  mutate(subscaleB_order = as.numeric(factor(as.numeric(factor(subscaleB)),
                                             levels = clust_subscales$order))) %>%
  ggplot(aes(x = reorder(subscaleA, desc(subscaleA_order)), 
             y = reorder(subscaleB, subscaleB_order), 
             fill = cor, label = round2(cor))) +
  geom_tile() +
  geom_text(size = 3) +
  scale_fill_distiller(guide = guide_colorbar(barheight = 50),
                       palette = "RdYlBu", limits = c(-1, 1)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "correlations among mean subscale scores, by site",
       subtitle = "using pairwise complete observations\nordered via hierarchical clustering analysis",
       fill = "correlation\ncoeff. (r)",
       x = "",
       y = "")
```

# Look at correlations among subscales by individuals

We could do the same thing thinking about individual participants instead of sites - but it's important to keep in mind that most people didn't fill out all `r length(levels(factor(d_long_subscale$subscale))) - 1` subscales! So we'll focus on just looking at how the subscales within each packet (Packet 1, 2, or 3) "hang together" for the people who completed that packet.

As above, to read these plots, find one scale on the x-axis, and another scale on the y-axis. The value (and color) at that point in the grid shows you the correlation between these two subscale means across the 5 sites. The maximum possible correlation is +1 (red), and the minimum is -1 (blue). (Note that these scales are being presented in alphabetical order, *not* by any sort of clustering analysis.)

## Packet 1

```{r fig.width = 3, fig.asp = 1}
d_long_subscale %>%
  filter(packet == 1, !is.na(sum_score)) %>%
  spread(subscale, sum_score) %>%
  select(-c(ctry, wher, recr, whoc, attn)) %>%
  mutate(subj = paste(subj, packet, version, sep = "_")) %>%
  select(-packet, -version) %>%
  distinct() %>%
  remove_rownames() %>%
  column_to_rownames("subj") %>%
  cor(use = "pairwise.complete.obs") %>%
  # corrplot version:
  # corrplot::corrplot(method = "color", tl.col = "black",
  #                    addCoef.col = "black", order = "hclust",
  #                    col = RColorBrewer::brewer.pal(n = 11, name = "PRGn"),
  #                    title = "\nPACKET 1: correlations among subscales")

  # ggplot version:
  data.frame() %>%
  rownames_to_column("subscaleA") %>%
  gather(subscaleB, cor, -subscaleA) %>%
  ggplot(aes(x = subscaleA, y = subscaleB, fill = cor, label = round2(cor))) +
  geom_tile() +
  geom_text(size = 3) +
  scale_fill_distiller(guide = guide_colorbar(barheight = 6),
                       palette = "RdYlBu", limits = c(-1, 1)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "PACKET 1: correlations among subscales",
       subtitle = "using pairwise complete observations",
       x = "",
       y = "")
```

## Packet 2

```{r fig.width = 3, fig.asp = 1}
d_long_subscale %>%
  filter(packet == 2, !is.na(sum_score)) %>%
  spread(subscale, sum_score) %>%
  select(-c(ctry, wher, recr, whoc, attn)) %>%
  mutate(subj = paste(subj, packet, version, sep = "_")) %>%
  select(-packet, -version) %>%
  distinct() %>%
  remove_rownames() %>%
  column_to_rownames("subj") %>%
  cor(use = "pairwise.complete.obs") %>%
  # corrplot version:
  # corrplot::corrplot(method = "color", tl.col = "black",
  #                    addCoef.col = "black", order = "hclust",
  #                    col = RColorBrewer::brewer.pal(n = 11, name = "PRGn"),
  #                    title = "\nPACKET 2: correlations among subscales")

  # ggplot version:
  data.frame() %>%
  rownames_to_column("subscaleA") %>%
  gather(subscaleB, cor, -subscaleA) %>%
  ggplot(aes(x = subscaleA, y = subscaleB, fill = cor, label = round2(cor))) +
  geom_tile() +
  geom_text(size = 3) +
  scale_fill_distiller(guide = guide_colorbar(barheight = 6),
                       palette = "RdYlBu", limits = c(-1, 1)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "PACKET 2: correlations among subscales",
       subtitle = "using pairwise complete observations",
       x = "",
       y = "")
```

## Packet 3

```{r fig.width = 4, fig.asp = 1}
d_long_subscale %>%
  filter(packet == 3, !is.na(sum_score)) %>%
  spread(subscale, sum_score) %>%
  select(-c(ctry, wher, recr, whoc, attn)) %>%
  mutate(subj = paste(subj, packet, version, sep = "_")) %>%
  select(-packet, -version) %>%
  distinct() %>%
  remove_rownames() %>%
  column_to_rownames("subj") %>%
  cor(use = "pairwise.complete.obs") %>%
  # corrplot version:
  # corrplot::corrplot(method = "color", tl.col = "black",
  #                    addCoef.col = "black", order = "hclust",
  #                    col = RColorBrewer::brewer.pal(n = 11, name = "PRGn"),
  #                    title = "\nPACKET 3: correlations among subscales")

  # ggplot version:
  data.frame() %>%
  rownames_to_column("subscaleA") %>%
  gather(subscaleB, cor, -subscaleA) %>%
  ggplot(aes(x = subscaleA, y = subscaleB, fill = cor, label = round2(cor))) +
  geom_tile() +
  geom_text(size = 3) +
  scale_fill_distiller(guide = guide_colorbar(barheight = 6),
                       palette = "RdYlBu", limits = c(-1, 1)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "PACKET 3: correlations among subscales",
       subtitle = "using pairwise complete observations",
       x = "",
       y = "")
```

## All packets

Just for fun, here's a look at the correlations among individual participants' subscale scores across all scales. **Note that some of these pairs of subscales probably have very few observations going into these correlations!! So take this with a grain of salt.**

```{r, fig.width = 6, fig.asp = 1}
d_long_subscale %>%
  spread(subscale, sum_score) %>%
  select(-c(ctry, wher, recr, whoc, attn)) %>%
  mutate(subj = paste(subj, packet, version, sep = "_")) %>%
  select(-packet, -version) %>%
  distinct() %>%
  remove_rownames() %>%
  column_to_rownames("subj") %>%
  cor(use = "pairwise.complete.obs") %>%
  data.frame() %>%
  rownames_to_column("subscaleA") %>%
  gather(subscaleB, cor, -subscaleA) %>%
  ggplot(aes(x = subscaleA, y = subscaleB, fill = cor, label = round2(cor))) +
  geom_tile() +
  geom_text(size = 3) +
  scale_fill_distiller(guide = guide_colorbar(barheight = 6),
                       palette = "RdYlBu", limits = c(-1, 1)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "PACKETS 1-3: correlations among subscales",
       subtitle = "using pairwise complete observations",
       x = "",
       y = "")
```
